These are not the k-mers you are looking for: efficient online k-mer counting using a probabilistic data structure
K-mer abundance analysis is widely used for many purposes in nucleotide
sequence analysis, including data preprocessing for de novo assembly, repeat
detection, and sequencing coverage estimation. We present the khmer software
package for fast and memory efficient online counting of k-mers in sequencing
data sets. Unlike previous methods based on data structures such as hash
tables, suffix arrays, and trie structures, khmer relies entirely on a simple
probabilistic data structure, a Count-Min Sketch. The Count-Min Sketch permits
online updating and retrieval of k-mer counts in memory, which is necessary to
support online k-mer analysis algorithms. On sparse data sets this data
structure is considerably more memory efficient than any exact data structure.
In exchange, the use of a Count-Min Sketch introduces a systematic overcount
for k-mers; moreover, only the counts, and not the k-mers, are stored. Here we
analyze the speed, the memory usage, and the miscount rate of khmer for
generating k-mer frequency distributions and retrieving k-mer counts for
individual k-mers. We also compare the performance of khmer to several other
k-mer counting packages, including Tallymer, Jellyfish, BFCounter, DSK, KMC,
Turtle and KAnalyze. Finally, we examine the effectiveness of profiling
sequencing error, k-mer abundance trimming, and digital normalization of reads
in the context of high khmer false positive rates. khmer is implemented in C++
wrapped in a Python interface, offers a tested and robust API, and is freely
available under the BSD license at github.com/ged-lab/khmer.
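The Count-Min Sketch idea in this abstract can be shown with a minimal Python sketch. This is not khmer's C++ implementation. The table sizes, the per-row salted BLAKE2b hashes, and the helper names (`CountMinSketch`, `kmers`, `count_reads`) are assumptions chosen for illustration, and reverse-complement (canonical) k-mers are ignored. The sketch demonstrates the two properties the abstract relies on. First, memory is fixed at `width × depth` counters no matter how many distinct k-mers arrive. Second, collisions can only add to a counter, so taking the minimum over rows may overcount but never undercount. Because only counters are stored, the k-mers themselves cannot be recovered from the table.

```python
import hashlib
from collections import Counter


class CountMinSketch:
    """Minimal Count-Min Sketch: `depth` rows of `width` integer counters."""

    def __init__(self, width: int, depth: int) -> None:
        self.width = width
        self.depth = depth
        # Memory is fixed up front and does not grow with the number of k-mers.
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, item: str):
        # One hash per row; a per-row BLAKE2b salt (an assumption, not khmer's
        # scheme) makes the rows behave like independent hash functions.
        data = item.encode("ascii")
        for row in range(self.depth):
            digest = hashlib.blake2b(
                data, digest_size=8, salt=row.to_bytes(16, "little")
            ).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, item: str) -> None:
        # Online update: increment one counter in every row.
        for row, col in self._cells(item):
            self.table[row][col] += 1

    def count(self, item: str) -> int:
        # Collisions only ever add, so the minimum is an upper-bound estimate.
        return min(self.table[row][col] for row, col in self._cells(item))


def kmers(seq: str, k: int):
    """Yield every overlapping k-mer of `seq` (no reverse-complement handling)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_reads(reads, k: int, width: int = 1000, depth: int = 4) -> CountMinSketch:
    """Stream reads once, adding each k-mer to the sketch (hypothetical helper)."""
    sketch = CountMinSketch(width, depth)
    for read in reads:
        for km in kmers(read, k):
            sketch.add(km)
    return sketch


# Usage: compare sketch estimates against exact counts on toy reads.
reads = ["ACGTACGTGACG", "CGTACGTGACGT", "TTTTACGTACGA"]
exact = Counter(km for read in reads for km in kmers(read, 5))
small = count_reads(reads, k=5, width=8, depth=4)  # deliberately tiny table
for km, true_count in exact.items():
    print(km, true_count, small.count(km))
```

Shrinking `width` raises the collision rate and therefore the overcount. Adding rows tightens the minimum at the cost of more memory.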
A tale of two niches: methods, concepts, and evolution
Being snapshots in time, species ranges may fall short of representing all of the geographic or environmental space that they are able to occupy. This has important implications for niche studies, yet most comparative studies overlook the transient nature of species distributions and assume that they are at equilibrium. We review the methods most widely used for niche comparisons today and suggest a modified framework to describe and compare niches based on snapshot species range data. First, we introduce a new environmental space-based Niche Equivalence Statistic to test niche similarity between two species, which explicitly incorporates the spatial distribution of environments and their availability into statistical tests. We also introduce a new Background Statistic to measure the ability of this Niche Equivalence Statistic to detect differences based on the available environmental space. These metrics enable fair comparisons between different geographies when the ranges of species are out of equilibrium. Based on distinct parameterizations of the new Equivalence and Background statistics, we then propose a Niche Divergence Test and a Niche Overlap Test, which allow assessment of whether differences between species emerge from true niche divergences. These methods are implemented in a new R package, ‘humboldt’, and applied to simulated species with pre-defined niches. The new methods improve the accuracy of niche similarity and associated tests, consistently outperforming other tests. We show that the quantification of niche similarity should be performed only in environmental space, which is less sensitive than geographic space to the spatial abundance of key environmental variables. Further, our methods characterize the relationships between non-analogous and analogous climates in the species’ distributions, something not previously available.
These improvements allow assessment of whether the different environmental spaces occupied by two taxa emerge from true niche evolution, as opposed to differences in life history and biological interactors, or differences in the variety and configuration of environments accessible to them.
Flow Dynamics And Plasma Heating Of Spheromaks In SSX
We report several new experimental results related to flow dynamics and heating from single dipole-trapped spheromaks and spheromak merging studies at SSX. Single spheromaks (stabilized with a pair of external coils, see Brown, Phys. Plasmas 13, 102503 (2006)) and merged FRC-like configurations (see Brown, Phys. Plasmas 13, 056503 (2006)) are trapped in our prolate (R = 0.2 m, L = 0.6 m) copper flux conserver. Local spheromak flow is studied with two Mach probes (r_1 = ρ_i) calibrated by time-of-flight with a fast set of magnetic probes at the edge of the device. Both Mach probes feature six ion collectors housed in a boron nitride sheath. The larger Mach probe will ultimately be used in the MST reversed field pinch. Line-averaged flow is measured by ion Doppler spectroscopy (IDS) at the midplane. The SSX IDS instrument measures, with 1 μs or better time resolution, the width and Doppler shift of the 229.7 nm line of the C III impurity (H plasma) to determine the temperature and line-averaged flow velocity (see Cothran, RSI 77, 063504 (2006)). We find axial flows of up to 100 km/s during formation of the dipole-trapped spheromak. Flow returns at the wall to form a large vortex. Recent high-resolution IDS velocity measurements during spheromak merging show bidirectional outflow jets at ±40 km/s (nearly the Alfvén speed). We also measure T_i ≥ 80 eV and T_e ≥ 20 eV during spheromak merging events after all plasma-facing surfaces are cleaned with helium glow discharge conditioning. Transient electron heating is inferred from bursts on a four-channel soft x-ray array. The spheromaks are also characterized by a suite of magnetic probe arrays for the magnetic structure B(r, t), and by interferometry for n_e. Finally, we are designing a new oblate, trapezoidal flux conserver for FRC studies. Equilibrium and dynamical simulations suggest that a tilt-stable, oblate FRC can be formed by spheromak merging in the new flux conserver.
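The IDS measurement described above rests on two textbook relations. The centroid shift of the line gives the line-of-sight velocity, v = c(λ − λ0)/λ0. If the width is due purely to thermal Doppler broadening, the full width at half maximum gives the ion temperature through FWHM/λ0 = sqrt(8 ln 2 · kT/(mc²)). The sketch below assumes a carbon emitter (12.011 u). It ignores instrumental, Zeeman, and any other broadening that a real analysis such as the SSX instrument's would have to handle. The function names are hypothetical.

```python
import math

C_M_PER_S = 299_792_458.0      # speed of light, m/s
AMU_EV = 931.49410242e6        # rest energy of one atomic mass unit, eV
LAMBDA0_NM = 229.7             # C III line quoted in the abstract, nm
CARBON_AMU = 12.011            # assumed emitter mass (carbon), u


def flow_velocity_km_s(lambda_obs_nm: float, lambda0_nm: float = LAMBDA0_NM) -> float:
    """Line-of-sight velocity from the centroid shift (non-relativistic).

    Positive means red-shifted, i.e. the emitters move away from the detector.
    """
    return C_M_PER_S * (lambda_obs_nm - lambda0_nm) / lambda0_nm / 1e3


def doppler_fwhm_nm(t_ev: float, lambda0_nm: float = LAMBDA0_NM,
                    mass_amu: float = CARBON_AMU) -> float:
    """Thermal (Gaussian) FWHM: FWHM / lambda0 = sqrt(8 ln2 * kT / (m c^2))."""
    mc2_ev = mass_amu * AMU_EV
    return lambda0_nm * math.sqrt(8.0 * math.log(2.0) * t_ev / mc2_ev)


def ion_temperature_ev(fwhm_nm: float, lambda0_nm: float = LAMBDA0_NM,
                       mass_amu: float = CARBON_AMU) -> float:
    """Invert doppler_fwhm_nm, assuming the width is purely thermal."""
    mc2_ev = mass_amu * AMU_EV
    return mc2_ev * (fwhm_nm / lambda0_nm) ** 2 / (8.0 * math.log(2.0))


# Usage with the magnitudes quoted in the abstract.
shift_nm = LAMBDA0_NM * 40e3 / C_M_PER_S  # wavelength shift for 40 km/s
print(f"shift for 40 km/s: {shift_nm:.4f} nm")
print(f"thermal FWHM at 80 eV: {doppler_fwhm_nm(80.0):.4f} nm")
print(f"recovered T_i: {ion_temperature_ev(doppler_fwhm_nm(80.0)):.1f} eV")
```

For the magnitudes quoted above, a 40 km/s outflow shifts the 229.7 nm line by only about 0.03 nm. An 80 eV carbon ion temperature corresponds to a thermal FWHM of roughly 0.046 nm.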
Digital Weapons of Mass Destabilization
In the coming decade, a global proliferation of networked technologies will widen the cyber threat landscape. Pairing new and unforeseen cyber vulnerabilities with weapons of mass destruction (WMD) increases the secondary threats that cyber attacks bring and also necessitates a shift in definitions. WMD will become weapons of mass destabilization, allowing adversaries to gain strategic advantage in novel ways.
Altering this definition provides clarity and specific actions that can be taken to disrupt, mitigate and recover from this combined threat. Additionally, a new class of Digital WMD (DWMD) will emerge, threatening military, government, and civilian targets worldwide. These combined and new threats will require the expansion of current defensive or mitigation activities, partnerships, and preparation.